🌦️ Weather Patterns Exploration (2010–2020)

Objective:
Analyze climate trends from 2010–2020, focusing on:

  • 🌡️ Temperature
  • 🌧️ Precipitation
  • 🌬️ Wind Speed
  • 💧 Humidity

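Dataset: Hourly weather observations in weatherHistory.csv (sourced from Kaggle). The first rows are dated April 2006, so the file's date range should be checked against the 2010–2020 window named above.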

In [1]:
import pandas as pd         
import numpy as np           
import matplotlib.pyplot as plt   
import seaborn as sns       
import plotly.express as px  
plt.style.use('seaborn-v0_8')
In [2]:
df = pd.read_csv('weatherHistory.csv') 
df.head()
Out[2]:
                  Formatted Date        Summary  Precip Type  Temperature (C)  Apparent Temperature (C)  Humidity
0  2006-04-01 00:00:00.000 +0200  Partly Cloudy         rain         9.472222                  7.388889      0.89
1  2006-04-01 01:00:00.000 +0200  Partly Cloudy         rain         9.355556                  7.227778      0.86
2  2006-04-01 02:00:00.000 +0200  Mostly Cloudy         rain         9.377778                  9.377778      0.89
3  2006-04-01 03:00:00.000 +0200  Partly Cloudy         rain         8.288889                  5.944444      0.83
4  2006-04-01 04:00:00.000 +0200  Mostly Cloudy         rain         8.755556                  6.977778      0.83

   Wind Speed (km/h)  Wind Bearing (degrees)  Visibility (km)  Loud Cover  Pressure (millibars)                      Daily Summary
0            14.1197                   251.0          15.8263         0.0               1015.13  Partly cloudy throughout the day.
1            14.2646                   259.0          15.8263         0.0               1015.63  Partly cloudy throughout the day.
2             3.9284                   204.0          14.9569         0.0               1015.94  Partly cloudy throughout the day.
3            14.1036                   269.0          15.8263         0.0               1016.41  Partly cloudy throughout the day.
4            11.0446                   259.0          15.8263         0.0               1016.51  Partly cloudy throughout the day.
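The rows shown start in April 2006, which is earlier than the 2010–2020 window in the title, so it may be worth checking the span of `Formatted Date` before interpreting any trend. The following is a minimal sketch on three stand-in timestamps (the first is taken from the output above, the other two are invented for illustration). The explicit `format` string and the 2010–2020 filter are assumptions, not steps from the original notebook.

```python
import pandas as pd

# Stand-in for df['Formatted Date']: the first value comes from the df.head()
# output, the other two are hypothetical.
sample = pd.Series([
    "2006-04-01 00:00:00.000 +0200",
    "2012-07-15 12:00:00.000 +0200",
    "2016-12-31 23:00:00.000 +0100",
])

# utc=True converts the mixed +0100/+0200 offsets to one common timezone.
parsed = pd.to_datetime(sample, format="%Y-%m-%d %H:%M:%S.%f %z", utc=True)
first_year, last_year = parsed.min().year, parsed.max().year
print("Span:", first_year, "to", last_year)

# Keep only timestamps whose (UTC) year lies inside the window from the title.
in_window = parsed[parsed.dt.year.between(2010, 2020)]
print("Rows inside 2010-2020:", len(in_window))
```

Applied to the real column, the same two steps would show whether the analysis actually covers the years named in the plot titles.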
In [3]:
print("Dataset Shape:", df.shape)
print("Columns:", df.columns)
df.info()
df.describe()
Dataset Shape: (96453, 12)
Columns: Index(['Formatted Date', 'Summary', 'Precip Type', 'Temperature (C)',
       'Apparent Temperature (C)', 'Humidity', 'Wind Speed (km/h)',
       'Wind Bearing (degrees)', 'Visibility (km)', 'Loud Cover',
       'Pressure (millibars)', 'Daily Summary'],
      dtype='object')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 96453 entries, 0 to 96452
Data columns (total 12 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Formatted Date            96453 non-null  object 
 1   Summary                   96453 non-null  object 
 2   Precip Type               95936 non-null  object 
 3   Temperature (C)           96453 non-null  float64
 4   Apparent Temperature (C)  96453 non-null  float64
 5   Humidity                  96453 non-null  float64
 6   Wind Speed (km/h)         96453 non-null  float64
 7   Wind Bearing (degrees)    96453 non-null  float64
 8   Visibility (km)           96453 non-null  float64
 9   Loud Cover                96453 non-null  float64
 10  Pressure (millibars)      96453 non-null  float64
 11  Daily Summary             96453 non-null  object 
dtypes: float64(8), object(4)
memory usage: 8.8+ MB
Out[3]:
       Temperature (C)  Apparent Temperature (C)      Humidity  Wind Speed (km/h)  Wind Bearing (degrees)  Visibility (km)  Loud Cover  Pressure (millibars)
count     96453.000000              96453.000000  96453.000000       96453.000000            96453.000000     96453.000000     96453.0          96453.000000
mean         11.932678                 10.855029      0.734899          10.810640              187.509232        10.347325         0.0           1003.235956
std           9.551546                 10.696847      0.195473           6.913571              107.383428         4.192123         0.0            116.969906
min         -21.822222                -27.716667      0.000000           0.000000                0.000000         0.000000         0.0              0.000000
25%           4.688889                  2.311111      0.600000           5.828200              116.000000         8.339800         0.0           1011.900000
50%          12.000000                 12.000000      0.780000           9.965900              180.000000        10.046400         0.0           1016.450000
75%          18.838889                 18.838889      0.890000          14.135800              290.000000        14.812000         0.0           1021.090000
max          39.905556                 39.344444      1.000000          63.852600              359.000000        16.100000         0.0           1046.380000
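The summary statistics hint at placeholder values: `Pressure (millibars)` has a minimum of 0 and a mean (about 1003) well below its median (about 1016), and `Loud Cover` is 0 in every row. Below is a sketch of one way to treat zero pressure as missing. It assumes, without confirmation from the source, that 0 marks an absent reading; the toy values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy pressure readings with two zero entries standing in for placeholders.
pressure = pd.Series([1015.13, 1015.63, 0.0, 1016.41, 0.0, 1016.51])

# Count the suspicious zeros, then mask them so they no longer drag the mean down.
n_zero = int((pressure == 0).sum())
cleaned = pressure.replace(0.0, np.nan)

print("Zero readings:", n_zero)
print("Mean with zeros:", round(pressure.mean(), 2))
print("Mean without zeros:", round(cleaned.mean(), 2))
```

pandas skips NaN in `mean()` by default, so the masked series averages only the plausible readings.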
In [4]:
df.isnull().sum()
Out[4]:
Formatted Date                0
Summary                       0
Precip Type                 517
Temperature (C)               0
Apparent Temperature (C)      0
Humidity                      0
Wind Speed (km/h)             0
Wind Bearing (degrees)        0
Visibility (km)               0
Loud Cover                    0
Pressure (millibars)          0
Daily Summary                 0
dtype: int64
In [5]:
df = df.dropna()
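`df.dropna()` removes every row that has a missing value in any column; here only `Precip Type` has gaps (517 rows). Two alternatives are sketched below on a toy frame with made-up values: restricting the drop to that column with `subset`, or keeping the rows under an explicit label. The label `'none'` is hypothetical and assumes a missing type means no precipitation, which would need to be confirmed against the dataset description.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Precip Type": ["rain", np.nan, "snow", np.nan],
    "Temperature": [9.4, 21.0, -1.2, 18.5],
})

# Option 1: drop rows only when 'Precip Type' itself is missing.
dropped = toy.dropna(subset=["Precip Type"])

# Option 2: keep every row and mark the gap with an explicit (assumed) label.
filled = toy.fillna({"Precip Type": "none"})

print(len(dropped), "rows kept after dropping")
print(filled["Precip Type"].tolist())
```

Option 2 keeps the temperature, wind, and humidity readings of those rows available for the later plots.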
In [6]:
df = df.rename(columns={
    'Formatted Date': 'Date',
    'Temperature (C)': 'Temperature',
    'Apparent Temperature (C)': 'FeelsLike',
    'Wind Speed (km/h)': 'WindSpeed',
    'Wind Bearing (degrees)': 'WindBearing',
    'Visibility (km)': 'Visibility',
    'Pressure (millibars)': 'Pressure'
})

df.columns
Out[6]:
Index(['Date', 'Summary', 'Precip Type', 'Temperature', 'FeelsLike',
       'Humidity', 'WindSpeed', 'WindBearing', 'Visibility', 'Loud Cover',
       'Pressure', 'Daily Summary'],
      dtype='object')
In [7]:
df.index
Out[7]:
Index([    0,     1,     2,     3,     4,     5,     6,     7,     8,     9,
       ...
       96443, 96444, 96445, 96446, 96447, 96448, 96449, 96450, 96451, 96452],
      dtype='int64', length=95936)

🌡️ Temperature Trends

We analyze temperature variations across years and months.

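In [9]:
# Parse the timestamps for the x-axis; the default row index holds row numbers, not dates.
dates = pd.to_datetime(df['Date'], errors='coerce', utc=True)

plt.figure(figsize=(12,6))
plt.plot(dates, df['Temperature'], color='orange', label='Temperature (°C)')
plt.xlabel("Date")
plt.ylabel("Temperature (°C)")
plt.title("Temperature Trends Over Time (2010–2020)")
plt.legend()
plt.show()
[Figure: Temperature Trends Over Time (2010–2020)]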
In [10]:
# Build the DatetimeIndex from the 'Date' strings. Passing the integer row index to
# pd.to_datetime treats each row number as nanoseconds since 1970-01-01.
df.index = pd.to_datetime(df['Date'].to_numpy(), errors='coerce', utc=True)
In [11]:
df.index
In [12]:
df['Year'] = df.index.year
df['Month'] = df.index.month
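`pd.to_datetime` interprets integers as nanoseconds since the Unix epoch, so converting a default integer row index yields timestamps in 1970, and `.year` and `.month` derived from it are 1970 and 1 for every row. The sketch below contrasts converting row numbers with converting the date strings, using two timestamps in the dataset's format; the toy frame is illustrative.

```python
import pandas as pd

toy = pd.DataFrame({
    "Date": ["2006-04-01 00:00:00.000 +0200", "2006-04-01 01:00:00.000 +0200"],
    "Temperature": [9.472222, 9.355556],
})

# Row numbers 0 and 1 are read as 0 ns and 1 ns after 1970-01-01.
from_rows = pd.to_datetime(toy.index.to_numpy(), utc=True)
years_from_rows = from_rows.year.tolist()

# The date strings give the actual observation times, converted to UTC.
from_col = pd.to_datetime(toy["Date"], utc=True)
years_from_col = from_col.dt.year.tolist()
months_from_col = from_col.dt.month.tolist()

print(years_from_rows, years_from_col, months_from_col)
```

With `utc=True`, local midnight on 1 April at +0200 becomes 22:00 on 31 March, so rows close to a month boundary are counted in the neighbouring month. Whether to group by UTC or local time is a design choice the original notebook does not address.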
In [13]:
monthly_avg = df.groupby('Month')['Temperature'].mean()

plt.figure(figsize=(10,5))
monthly_avg.plot(kind='bar', color='skyblue')
plt.xlabel("Month")
plt.ylabel("Avg Temperature (°C)")
plt.title("Average Monthly Temperature (2010–2020)")
plt.show()
[Figure: Average Monthly Temperature (2010–2020)]
In [14]:
yearly_avg = df.groupby('Year')['Temperature'].mean()

plt.figure(figsize=(10,5))
yearly_avg.plot(marker='o', color='green')
plt.xlabel("Year")
plt.ylabel("Avg Temperature (°C)")
plt.title("Yearly Average Temperature (2010–2020)")
plt.grid(True)
plt.show()
[Figure: Yearly Average Temperature (2010–2020)]
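One way to put a number on a year-over-year trend is a least-squares line through the yearly means. The sketch below uses `np.polyfit` on invented yearly averages (not values from this dataset) and reads the slope as degrees per year. With only a handful of years, such a slope is a rough description rather than evidence of a climate trend.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly mean temperatures in °C, indexed by year.
yearly_avg_toy = pd.Series(
    [11.2, 11.5, 11.1, 11.9, 12.0],
    index=[2012, 2013, 2014, 2015, 2016],
)

# Centre the years so the intercept equals the overall mean and the fit stays well conditioned.
years = yearly_avg_toy.index.to_numpy(dtype=float)
slope, intercept = np.polyfit(years - years.mean(), yearly_avg_toy.to_numpy(), deg=1)

print(f"Slope: {slope:.2f} °C per year")
print(f"Mean level: {intercept:.2f} °C")
```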
In [15]:
heatmap = df.pivot_table(values='Temperature', index='Month', columns='Year', aggfunc='mean')

plt.figure(figsize=(12,6))
sns.heatmap(heatmap, cmap="coolwarm", annot=False)
plt.title("Monthly Temperature Heatmap (2010–2020)")
plt.show()
[Figure: Monthly Temperature Heatmap (2010–2020)]

🌧️ Precipitation Trends

We study rainfall/snowfall distribution over the years.

In [16]:
precip_counts = df['Precip Type'].value_counts()

plt.figure(figsize=(6,4))
precip_counts.plot(kind='bar', color=['blue','gray','orange'])
plt.xlabel("Precipitation Type")
plt.ylabel("Count")
plt.title("Distribution of Precipitation Types")
plt.show()
[Figure: Distribution of Precipitation Types]
In [17]:
precip_yearly = df.groupby(['Year','Precip Type']).size().unstack(fill_value=0)

precip_yearly.plot(kind='bar', stacked=True, figsize=(12,6))
plt.xlabel("Year")
# Each row is one hourly observation, so the counts are hours, not days.
plt.ylabel("Number of Hourly Observations")
plt.title("Yearly Precipitation Types (2010–2020)")
plt.legend(title="Precip Type")
plt.show()
[Figure: Yearly Precipitation Types (2010–2020)]
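Raw counts per year depend on how many observations each year contains, so a partial first or last year can look like a change in precipitation. Below is a sketch of per-year shares using `pd.crosstab` with `normalize='index'`, on a toy frame with invented rows.

```python
import pandas as pd

toy = pd.DataFrame({
    "Year": [2014, 2014, 2014, 2015, 2015],
    "Precip Type": ["rain", "rain", "snow", "rain", "snow"],
})

# normalize='index' divides each row (year) by its own total, so every row sums to 1.
shares = pd.crosstab(toy["Year"], toy["Precip Type"], normalize="index")
print(shares.round(3))
```

The resulting frame can be passed to `shares.plot(kind='bar', stacked=True)` in the same way as the count table.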
In [18]:
fig = px.histogram(df, x="Year", color="Precip Type",
                   title="Yearly Precipitation Distribution (Interactive)",
                   barmode="stack")
fig.show()

🌬️ Wind Speed & 💧 Humidity

We examine how wind speed and humidity behave alongside temperature.

In [19]:
yearly_wind = df.groupby('Year')['WindSpeed'].mean()

plt.figure(figsize=(10,5))
yearly_wind.plot(marker='o', color='purple')
plt.xlabel("Year")
plt.ylabel("Avg Wind Speed (km/h)")
plt.title("Yearly Average Wind Speed (2010–2020)")
plt.grid(True)
plt.show()
[Figure: Yearly Average Wind Speed (2010–2020)]
In [20]:
plt.figure(figsize=(8,5))
sns.histplot(df['WindSpeed'], bins=30, kde=True, color='skyblue')
plt.xlabel("Wind Speed (km/h)")
plt.ylabel("Frequency")
plt.title("Wind Speed Distribution")
plt.show()
[Figure: Wind Speed Distribution]
In [21]:
plt.figure(figsize=(12,6))
plt.plot(df.index, df['Humidity'], color='teal', alpha=0.5, label='Humidity')
plt.xlabel("Date")
# Humidity is stored as a fraction between 0 and 1, not as a percentage.
plt.ylabel("Humidity (fraction, 0–1)")
plt.title("Humidity Trends Over Time (2010–2020)")
plt.legend()
plt.show()
[Figure: Humidity Trends Over Time (2010–2020)]
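A line through every hourly reading tends to produce a dense band. Averaging to daily values before plotting is one option. The sketch below assumes the frame has a `DatetimeIndex` and uses 48 synthetic hourly values rather than the real data.

```python
import numpy as np
import pandas as pd

# 48 synthetic hourly timestamps starting at midnight UTC (two full days).
start = pd.Timestamp("2016-01-01", tz="UTC")
hours = start + pd.to_timedelta(np.arange(48), unit="h")

# Made-up humidity fractions: 0.80 on the first day, 0.60 on the second.
humidity = pd.Series(np.r_[np.full(24, 0.80), np.full(24, 0.60)], index=hours)

# Group the hourly readings into calendar days and average each day.
daily = humidity.resample("D").mean()
print(daily)
```

On the real frame the equivalent call would be `df['Humidity'].resample('D').mean()`, which only works once the index holds the parsed timestamps.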
In [22]:
plt.figure(figsize=(8,6))
sns.scatterplot(x='Temperature', y='Humidity', data=df, alpha=0.3)
plt.xlabel("Temperature (°C)")
plt.ylabel("Humidity (fraction, 0–1)")
plt.title("Temperature vs Humidity")
plt.show()
[Figure: Temperature vs Humidity]
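The scatter plot can be complemented with a correlation coefficient. `Series.corr` computes Pearson's r by default. The sketch uses four invented temperature and humidity pairs, so the printed value says nothing about the real dataset.

```python
import pandas as pd

toy = pd.DataFrame({
    "Temperature": [0.0, 10.0, 20.0, 30.0],   # °C, hypothetical
    "Humidity": [0.90, 0.80, 0.60, 0.40],     # fraction, hypothetical
})

# Pearson correlation: -1 is a perfect inverse linear relation, 0 is none, +1 is perfect direct.
r = toy["Temperature"].corr(toy["Humidity"])
print(f"Pearson r = {r:.3f}")
```

On the notebook's frame the same call would be `df['Temperature'].corr(df['Humidity'])`.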

📊 Key Insights & Conclusion

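  • Temperature: An overall warming trend is reported after 2015; it should be confirmed against yearly means computed from the parsed timestamps.
  • Precipitation: Rain is recorded more often than snow.
  • Wind Speed: Average wind speed is fairly steady from year to year, with peaks in 2016–2018.
  • Humidity: Negatively correlated with temperature (hotter hours tend to have drier air).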

These patterns can serve as a starting point for climate monitoring and predictive modeling.